Abstract: The Frechet mean or barycenter generalizes the idea of averaging
in spaces where pairwise addition is not well-defined. In general metric
spaces, however, the Frechet sample mean is not a consistent estimator of 
the theoretical Frechet mean. For non-trivial examples, the Frechet sample 
mean may fail to converge. Hence, it becomes necessary to consider other 
types of convergence. We show that a specific type of almost sure (a.s.) 
convergence for the Frechet sample mean introduced by Ziezold (1977) is, 
in fact, equivalent to the consideration of the Kuratowski outer limit of 
a sequence of Frechet sample means. Equipped with this outer limit, we 
prove different laws of large numbers for random variables taking values in 
a separable (pseudo-)metric space with a bounded metric. In this setting, 
we describe strong laws of large numbers for both the restricted and unre- 
stricted Frechet sample means of all orders, thereby generalizing Ziezold's 
original result. In addition, we also show that both the restricted and unre- 
stricted Frechet sample means are metric squared error (MSE) consistent. 
Interestingly, we derive a simple upper bound for this MSE, which is com- 
posed of the Frechet variance of the estimator and a bias term, thereby 
generalizing the classical decomposition of the mean squared error for
estimators of real-valued random variables.

AMS 2000 subject classifications: Barycenter, Centroid, Consistency,
Estimation theory, Equicontinuity, Frechet mean, Frechet variance, Karcher
mean, Metric space, Metric squared error, Point function.

1. Introduction 

All statistics are summaries. The epitome of these summaries is the sample 
mean, and its theoretical analog, the expected value. In an inspired monograph, 
Frechet (1948) generalized this concept to any abstract metric space. He showed 
that the sole requirement for the definition of a mean element is the specification 
of a metric on the space of interest. Once this metric has been chosen and a 
probability measure has been defined on that metric space, the Frechet mean is 
simply the element that minimizes the sum of the squared distances from all the 
elements in that space. The Frechet mean generalizes other notions of means in
abstract spaces, such as the centroid in Euclidean geometry, the barycenter or
center of mass in physics, the Procrustean mean in shape spaces (Le, 1998), and
the Karcher mean on Riemannian manifolds (Karcher, 1977). The sample version
of the Frechet mean can naturally be expressed using cumulative addition
instead of the expectation, thereby producing a convex combination operator 
on metric spaces of both negative and positive Alexandrov curvature (Ginestet 
et al., 2012). 

The object of this paper is to characterize the asymptotic behavior of the 
Frechet sample mean in separable bounded metric spaces, where a bounded 
metric space is a metric space with a bounded metric. Separability is a rel- 
atively mild topological assumption likely to be satisfied in most applications. 
The boundedness of the metric, however, is a more stringent condition. Nonethe- 
less, there is a range of modern statistical applications for which the metric of 
interest is likely to be bounded. In bioinformatics, the use of the Hamming 
(1950) distance on finite alphabets, such as stretches of DNA for instance,
naturally gives rise to such assumptions (He et al., 2004). Similarly, the comparison
of families of networks with a given number of nodes, as commonly done in
neuroscience (Ginestet et al., 2011), may generate bounded metric spaces,
albeit the combinatorial nature of these metrics may lead to bounds that
increase factorially with the number of nodes in these networks. The Frechet mean
has proved to be especially popular in machine learning, where it has been ap- 
plied to spaces of probability measures to facilitate clustering (Lee et al., 2007) 
and to spaces of images (Davis and Lazebnik, 2008, Gerber et al., 2009). 

The asymptotic properties of the Frechet sample mean have been studied by 
several authors. Ziezold (1977) proved a strong law of large numbers for Frechet
sample means defined in separable bounded quasi-metric spaces, where the metric
is not assumed to satisfy the coincidence axiom. This a.s. convergence result
has also been demonstrated for compact metric spaces by Sverdrup-Thygeson
(1981). The perspectives adopted by these two authors are very different in
nature. Given the fact that Sverdrup-Thygeson (1981) does not cite the work of
Ziezold (1977), and because the work of the latter was published in a conference
proceedings, it is probable that Sverdrup-Thygeson (1981) was not cognisant of
Ziezold's proof technique.

The result due to Ziezold (1977) is stronger than the one due to
Sverdrup-Thygeson (1981). By the Heine-Borel theorem, a metric space is compact if,
and only if, it is complete and totally bounded. The latter condition implies
that every compact metric space has finite diameter, and therefore constitutes
a bounded metric space. (Alternatively, using the continuity of the metric
function, observe that the continuous image of a compact space is itself compact.)
The converse, however, does not hold. A bounded metric space need not be
compact: One can transform any metric space into a bounded metric space, by
adopting the discrete metric (i.e. d(x,y) = 0 if x and y are identical and 1,
otherwise). In general, an infinite set endowed with the discrete metric will be
bounded, but not totally bounded, in the sense that it is not possible to
cover such a space with a finite number of balls of radius smaller than one.

The properties of sample Frechet means on Riemannian manifolds have been
particularly well-studied (Bhattacharya and Patrangenaru, 2002, 2005,
Bhattacharya and Bhattacharya, 2012). When the Frechet mean is assumed to be
unique, the theorem of Sverdrup-Thygeson (1981) has been generalized by
Bhattacharya and Patrangenaru (2003) for proper metric spaces. Recall that a metric
space is proper, if and only if every bounded closed subset of that space is
compact (Sahib, 1998, Yang, 2011). By the Hopf-Rinow theorem, every complete and
connected Riemannian manifold is a proper metric space. Thus, Bhattacharya
and Patrangenaru (2003) have weakened the compactness assumption made by
Sverdrup-Thygeson (1981), and their strong law of large numbers applies to
manifolds, under some very mild conditions. Recently, Kendall and Le (2011) have
further generalized these results with a weak law of large numbers and a central
limit theorem for sequences of Frechet sample means based on non-iid random
variables taking values on a Riemannian manifold.

Here, we position ourselves in the general setting of Ziezold (1977), where we 
are not assuming any existing smooth structure. We will consider sequences of
random variables taking values in separable quasi-metric spaces with a bounded
metric. We generalize the seminal result of Ziezold (1977) to Frechet sample
means of any order, and to restricted Frechet sample means. The restricted
Frechet sample mean is the most 'typical' quantity chosen from the available 
sampled values. The computation of the unrestricted Frechet sample mean in 
arbitrary metric spaces can indeed prove to be arduous, since this necessarily 
requires a minimization over a complex space. The difficulties that arise when 
estimating the Frechet mean in shape spaces, for instance, have received special 
attention (Dryden and Mardia, 1998, Kume and Le, 2000, Le, 2001, 2004). Esti- 
mation issues have also been addressed in spaces of covariance matrices, where a 
range of different metrics can be considered (Arsigny et al., 2007, Dryden et al., 
2009, Yang et al., 2011). The use of the restricted Frechet mean may therefore
be useful in practice, as it greatly simplifies the minimization procedure. 

Importantly, we also clarify previous results on the asymptotic consistency of 
the Frechet sample mean, by showing that the modes of convergence studied by
Ziezold (1977) and Sverdrup-Thygeson (1981) are, in fact, equivalent to the con- 
sideration of the Kuratowski outer limit of a sequence of Frechet sample means. 
This straightforward reformulation directly leads to a proof of the convergence 
of the Frechet sample mean in metric squared error (MSE) to its theoretical 
analogue. Of independent interest is the fact that this MSE can be bounded 
above by the sum of the Frechet variance of the estimator of interest and a bias
term, which therefore provides a generalization of the classical decomposition
of the mean squared error for real-valued random variables.

One of the core difficulties with the consideration of the asymptotic properties 
of Frechet sample means is that such functions can be multivalued. That is, when 
the Frechet sample mean is not unique, we obtain a random variable that is a 
set-valued function, which takes values in the power set of $\mathcal{X}$, or more precisely
in the Borel $\sigma$-algebra of $\mathcal{X}$. It then becomes necessary to consider the
convergence of multivalued functions. To this end, we resort to the tools of set-valued
analysis, as described by Aubin and Frankowska (2009). This difficulty leads us
to consider different 'types' of convergence, depending on whether we require
the Frechet sample mean to converge, or are simply interested in evaluating the
asymptotic behavior of the outer limit of that sequence. 

The main innovation in this paper is our formal set-valued perspective. Note 
that our approach differs from the one of Bhattacharya and Bhattacharya 
(2012), since we have allowed the metric spaces of interest to be non-compact, 
and not necessarily manifolds. In particular, we identify the key role played by 
the Kuratowski outer limit in the sequence of Frechet sample means. This paper
therefore constitutes an extension of the work of Ziezold (1977) and
Sverdrup-Thygeson (1981) to Frechet means of all orders, and to restricted Frechet means.
Moreover, we have emphasized the importance of point functions and of the
Glivenko-Cantelli lemma. By positioning ourselves in bounded metric spaces,
we do not have the nearest-point property, and therefore there is no guarantee
that the Frechet mean sets are closed sets in these spaces of interest. Some care
must therefore be taken when evaluating the convergence of such sequences of 
sets. 

This paper is organized as follows. In section 2, we introduce and study
different types of a.s. convergence for sequences of Frechet sample means, and
show through counterexamples that the Kuratowski outer limit is better suited
for this purpose. In section 3, we generalize the strong law of Ziezold (1977) to
Frechet sample means of all orders. Section 4 is devoted to the description of
the restricted versions of the Frechet sample mean, and a generalization of a
result due to Sverdrup-Thygeson (1981) to bounded metric spaces, for random
variables with closed support. In section 5, we derive equivalent results in terms 
of MSE consistency. 

2. Sequences of Frechet Sample Means

2.1. Empirical and Theoretical Frechet Means 

A separable space $\mathcal{X}$ is endowed with a metric $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{+}$. This produces
a metric space, $(\mathcal{X},d)$, with elements $x$. Let a probability space be denoted by
$(\Omega,\mathcal{F},\mathbb{P})$, and define a random variable, $X$, on that space, which takes values
in $(\mathcal{X},\mathcal{B})$. Here, $\mathcal{B}$ is the Borel $\sigma$-algebra generated by the topology, $\tau$ on $\mathcal{X}$,
induced by $d$. The triple $(\Omega,\mathcal{F},\mathbb{P})$ is assumed to be complete, in the sense that
every subset of every null set is measurable. This is particularly convenient for
constructing product spaces based on $\Omega$ that remain well-behaved. In addition,
we define $\mu(B) := (\mathbb{P} \circ X^{-1})(B)$, for every $B \in \mathcal{B}$. Naturally, $X$ is here assumed
to be $(\mathcal{F},\mathcal{B})$-measurable. Such a random variable will be termed an
abstract-valued random variable, which will be contrasted with the more standard
real-valued random variables.

In this setting, we compute the most 'central' element. This is the element
that has the smallest expected distance to all other elements in $\mathcal{X}$. This approach
allows us to define the following moments (Frechet, 1948),

$$\Theta^{r} := \operatorname*{arginf}_{x' \in \mathcal{X}} \int_{\mathcal{X}} d(x,x')^{r} \, d\mu(x), \quad \text{and} \quad \sigma^{r} := \inf_{x' \in \mathcal{X}} \int_{\mathcal{X}} d(x,x')^{r} \, d\mu(x), \tag{1}$$

for every $1 \le r < \infty$, and where $\Theta^{r} \subseteq \mathcal{X}$. Observe that we are using the
superscript $r$ on the Frechet variance as a simple marker of the order of the
exponentiated metric. Thus, in general, it will not be true that $(\sigma^{r})^{1/r}$ simplifies
to $\sigma^{1}$.

These are commonly referred to as the Frechet mean and variance when
$r = 2$. For other choices of $r$, we will refer to these different Frechet moments
as Frechet moments of order $r$. Note that if the infimum of $\mathbb{E}[d(X,x')^{r}]$ exists,
then it is unique. However, the argument of the infimum may not necessarily
exist and may not be unique. If such an argument does not exist, then $\Theta^{r} = \emptyset$.
When the minimizer is not unique, the ensemble of minimizers is sometimes
referred to as the Frechet mean set. In particular, observe that if $\Theta$ is not a
singleton, $\sigma^{2} = \mathbb{E}[d(X,\theta)^{2}]$ for any $\theta \in \Theta$, will not, in general, be equivalent to
$\mathbb{E}[d(X,\Theta)^{2}]$, where the distance between an element $x$ and a non-empty subset
$A$ of $\mathcal{X}$ is defined as $d(x,A) := \inf\{d(x,y) : y \in A\}$, with $d(x,\emptyset) = \infty$. In this
paper, Frechet mean and Frechet mean set will be used interchangeably. Observe
that when $\mathcal{X}$ is a Hilbert space, endowed with the inner product metric, then
there exists a unique global minimizer and $\Theta$ is therefore a singleton.

Analogously, for a given sequence of abstract-valued random variables
$X_i : \Omega \to \mathcal{X}$, for every $i = 1,\ldots,n$, one may define the following Frechet sample
moments of the $r$-th order

$$\Theta^{r}_{n} := \operatorname*{arginf}_{x' \in \mathcal{X}} \frac{1}{n} \sum_{i=1}^{n} d(X_i,x')^{r}, \quad \text{and} \quad \sigma^{r}_{n} := \inf_{x' \in \mathcal{X}} \frac{1}{n} \sum_{i=1}^{n} d(X_i,x')^{r}. \tag{2}$$

Observe that, even for the sample versions of the Frechet moments, these infima
need not be attained, and therefore the sets $\Theta^{r}_{n}$ may be empty for each $n$.
When there is no ambiguity as to the order of $\Theta^{r}_{n}$, we will simply refer to this
quantity as $\Theta_n$, and similarly for $\Theta$. In the sequel, an element of $\Theta$ and an
element of $\Theta_n$ will be respectively denoted by $\theta$ and $\theta_n$. Our interest will mainly
lie in considering Frechet moments of the second order, albeit some examples
will also be studied where $r = 1$. In general metric spaces, the empirical and
theoretical Frechet means may not be closed. (Take, for instance, the space
$\mathcal{X} := \{-2\} \cup (-1,1) \cup \{2\}$, with 1/2 point masses at $-2$ and $2$.) However, it is
easy to see that the Frechet mean and Frechet sample mean are closed subsets
of $\mathcal{X}$, if $\mathcal{X}$ is Polish.
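
When the candidate set is finite, the sample moments in equation (2) can be evaluated by exhaustive search. The following is a minimal sketch rather than the author's procedure: the function name `frechet_sample_moments`, the Hamming distance helper, and the toy sample of short strings are all illustrative assumptions. Restricting the candidates to the observed values gives the restricted Frechet sample mean mentioned in the introduction.

```python
from fractions import Fraction


def hamming(a, b):
    """Hamming distance between two equal-length strings (a bounded metric)."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def frechet_sample_moments(sample, candidates, dist, r=2):
    """Minimize the order-r sample energy over a finite candidate set.

    Returns (mean_set, variance): every candidate attaining the minimum of
    (1/n) * sum_i dist(X_i, c)**r, together with that minimum value.
    Exact rational arithmetic keeps ties between candidates exact.
    """
    n = len(sample)
    energy = {c: Fraction(sum(dist(x, c) ** r for x in sample), n)
              for c in candidates}
    variance = min(energy.values())
    mean_set = [c for c in energy if energy[c] == variance]
    return mean_set, variance


# Hypothetical toy sample; candidates = observed values (restricted mean).
sample = ["ACGT", "ACGA", "ACCA"]
mean_set, variance = frechet_sample_moments(sample, sample, hamming, r=2)
print(mean_set, variance)
```

Because all minimizers are returned, the sketch also makes the set-valued nature of the estimator visible: a tie between candidates yields a mean set with more than one element.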

Lemma 1. For any metric space $(\mathcal{X},d)$, $\Theta^{r}$ and the $\Theta^{r}_{n}$'s are closed.

Proof. Clearly, if $\Theta^{r} = \emptyset$, then $\operatorname{cl}(\Theta^{r}) = \Theta^{r}$ and similarly for the $\Theta^{r}_{n}$'s. Now, fix
$r = 1$, and consider the Frechet mean set $\Theta \subseteq \mathcal{X}$. Recall that the boundary of
$\Theta$ is defined as $\partial(\Theta) := \{x \in \mathcal{X} : d(\Theta,x) = d(\Theta^{c},x) = 0\}$, where $\Theta^{c} := \mathcal{X} \setminus \Theta$.
We proceed by contradiction. Assume that $\theta_0 \in \partial(\Theta)$ and $\theta_0 \notin \Theta$, then it
follows that there exists $\theta \in \Theta$, such that by the triangle inequality, $d(\theta_0,X) \le
d(\theta_0,\theta) + d(\theta,X)$, for every $X \in \mathcal{X}$. Taking the expectation, this gives

$$\mathbb{E}[d(\theta_0,X)] \le d(\theta_0,\theta) + \mathbb{E}[d(\theta,X)] = \inf_{x' \in \mathcal{X}} \mathbb{E}[d(X,x')],$$

since $d(\theta_0,\Theta) = 0$, and using the definition of $\theta$ in equation (1). Thus, $\theta_0$ is
optimal with respect to the infimum over $\mathcal{X}$. However, we have assumed that
$\theta_0 \notin \Theta$, which leads to a contradiction, and therefore $\partial(\Theta) \subseteq \Theta$.

Next, consider the case of $r > 1$. Through a classical result on metric spaces
(see, for instance Frechet, 1948, p. 229), we have

$$\left(\mathbb{E}[d(\theta_0,X)^{r}]\right)^{1/r} \le \left(\mathbb{E}[d(\theta_0,\theta)^{r}]\right)^{1/r} + \left(\mathbb{E}[d(\theta,X)^{r}]\right)^{1/r},$$

for every $r \ge 1$, and the result immediately follows, using the same argument.
The proof is identical for the $\Theta^{r}_{n}$'s. □



2.2. Convergence of Frechet Sample Mean Sets 



In this section, we study and compare different modes of convergence for set- 
valued random variables. In particular, note that our chosen modes of con- 
vergence differ from the ones used by Bhattacharya and Bhattacharya (2012), 
since we are not here assuming the compactness of the underlying metric space 
X. Moreover, the target Frechet mean set is also allowed to be empty, thereby 
making it difficult to implement the methods of Bhattacharya and Bhattacharya 
(2012). 

For the Frechet sample mean and its theoretical analogue, a.s. convergence
could be defined in $(\mathcal{X},d)$ using sequences of random sets as follows,

$$\mathbb{P}\left[\left\{\omega \in \Omega : \Theta_n(\omega) \to \Theta\right\}\right] = 1, \tag{3}$$

where observe that $\Theta$ is here treated as a fixed subset of $\mathcal{X}$. The event in
equation (3) will have probability one if the sequence of random sets, denoted
$\Theta_n$, converges a.s. in a set-theoretical sense such that

$$\liminf_{n\to\infty} \Theta_n(\omega) = \limsup_{n\to\infty} \Theta_n(\omega) = \Theta, \tag{4}$$

for almost every $\omega \in \Omega$, and where $\liminf S_n := \bigcup_{n=1}^{\infty} \bigcap_{m=n}^{\infty} S_m$ and
$\limsup S_n := \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} S_m$ denote the standard inner and outer limits of a
sequence of subsets of $\mathcal{X}$. For most purposes, however, this type of convergence
is too strong. In fact, this criterion does not hold for Frechet sample means de- 
fined with respect to general abstract- valued random variables. There are many 
non-trivial examples of sequences of Frechet sample means that diverge. Con- 
sider the following example adapted from the three-dimensional case described 
by Sverdrup-Thygeson (1981). 

Fig 1. Metric and measure spaces considered in examples 1 and 2. In both panels, the
closed interval [-1,1] is equipped with the Manhattan (or taxicab) metric, and two
point masses are specified at -1 and 1. Different Frechean inferences are conducted
by taking r = 1 and r = 2 in panels (a) and (b), respectively. In the first case, the
theoretical Frechet mean coincides with the median of X, whereas in panel (b), the
theoretical Frechet mean coincides with the arithmetic mean. However, the sequence
of Frechet sample means diverges in both cases, when convergence is evaluated using
set-valued liminf and limsup, as described in equation (4).



Example 1. Consider the interval $\mathcal{X} := [-1,1] \subset \mathbb{R}$, and equip this set with the
usual 'Manhattan' distance, defined as $d(x,y) := |x - y|$ for every $x,y \in \mathcal{X}$.
Additionally, let $X$ be a random variable, which takes values in $\mathcal{X}$, and which
satisfies $\mathbb{P}[X = -1] = \mathbb{P}[X = 1] = 1/2$. This construction is illustrated
in panel (a) of figure 1. The theoretical Frechet mean of order $r = 1$ can
be readily found as

$$\Theta^{1} = \operatorname*{arginf}_{x' \in \mathcal{X}} \sum_{x \in \{-1,1\}} d(x,x') \, \mathbb{P}[x] = \mathcal{X},$$

since the energy function satisfies $\mathcal{E}(x') := \sum_{x \in \{-1,1\}} d(x,x') \, \mathbb{P}[x] = 1$ for every $x' \in \mathcal{X}$.
Here, the Frechet mean defined with respect to the Manhattan distance coincides
with the median of the real-valued random variable $X$ (Feldman and Tucker,
1966).

For the empirical Frechet mean, $\Theta^{1}_{n}$, first compute $S_n := \sum_{i=1}^{n} X_i$. Clearly,
the $S_n$'s are integer-valued. Observe the correspondence between the values of
$S_n$ and the values taken by the Frechet sample mean. If the event $\{S_n = 0\}$
occurs, then it can easily be seen that $\Theta_n$ is equal to $\mathcal{X}$. Similarly, $\{S_n \ge 1\}$,
and $\{S_n \le -1\}$ respectively imply that $\Theta_n = \{1\}$ and $\Theta_n = \{-1\}$. Now,

$$\mathbb{P}[\{S_{2n} = 0\}] = \binom{2n}{n} \left(\frac{1}{2}\right)^{2n},$$

for every $n$, using Stirling's approximation. Since $\mathbb{P}[\{S_n = 0\}]$ is null, when $n$
is odd, it follows that $\sum_{n=1}^{\infty} \mathbb{P}[\{S_n = 0\}] < \infty$, and therefore by the Borel-Cantelli
lemma, we have $\mathbb{P}[\{S_n = 0\} \text{ i.o.}] = 0$, where i.o. means infinitely often.
This implies that $\mathbb{P}[\{\Theta_n = \mathcal{X}\} \text{ i.o.}] = 0$, and hence $\limsup \Theta_n \neq \mathcal{X}$.
By using a similar argument, one can observe that $\mathbb{P}[\{S_n \le -1\} \text{ i.o.}] =
\mathbb{P}[\{S_n \ge 1\} \text{ i.o.}] = 1$, which implies that $\mathbb{P}[\{\Theta_n = \{-1\}\} \text{ i.o.}] = \mathbb{P}[\{\Theta_n = \{1\}\} \text{ i.o.}] = 1$,
and therefore $\{-1,1\}$ is the limit superior of the sequence of Frechet mean
sets. By contrast, there does not exist an $N > 0$, such that $\Theta_n = \{1\}$, for every
$n > N$. An identical statement holds for $\Theta_n = \{-1\}$, and therefore the limit inferior
of $\Theta_n$ is empty. Thus,

$$\limsup_{n\to\infty} \Theta_n(\omega) = \{-1,1\} \supset \liminf_{n\to\infty} \Theta_n(\omega) = \emptyset,$$

and the sequence of Frechet sample means diverges, as criterion (4) is not
satisfied.
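
The correspondence between the sign of S_n and the order-1 Frechet sample mean set used in example 1 can be checked numerically on a finite grid. This is a sketch under stated assumptions: the grid resolution, the function name `order1_mean_set`, and the three toy samples are illustrative choices, and a grid search only approximates minimization over the whole interval [-1,1].

```python
from fractions import Fraction


def order1_mean_set(sample, grid):
    """Grid points minimizing the order-1 sample energy sum_i |X_i - c|."""
    energy = {c: sum(abs(x - c) for x in sample) for c in grid}
    best = min(energy.values())
    return [c for c in grid if energy[c] == best]


# Grid of step 1/4 on [-1, 1]; exact fractions keep ties exact.
grid = [Fraction(k, 4) for k in range(-4, 5)]

more_plus = order1_mean_set([1, 1, -1], grid)    # S_n = 1
more_minus = order1_mean_set([-1, -1, 1], grid)  # S_n = -1
balanced = order1_mean_set([1, -1], grid)        # S_n = 0

print(more_plus)         # single grid point 1
print(more_minus)        # single grid point -1
print(balanced == grid)  # every grid point minimizes the energy
```

In the balanced case the energy is constant on the interval, which is why the sample mean set jumps from a single endpoint to the whole space whenever S_n returns to zero.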

Remark 1. The preceding example highlights two important aspects of the
asymptotic behavior of the Frechet sample mean set. Firstly, the Frechet sample
mean will in general fail to converge in the sense that its outer and inner limits
need not be identical. In such cases, the sequence of Frechet sample means
exhibits an oscillatory property (see Feldman and Tucker, 1966). Secondly, the
limit superior of a sequence of Frechet sample means may solely represent a
subset of the theoretical Frechet mean. Taken together, these two problems
necessitate (i) the study of the asymptotic behavior of the outer limit of the
$\Theta_n$'s, and (ii) the consideration of the convergence of the Frechet sample
mean in terms of set inclusion, as a subset of the theoretical Frechet mean. The
passage from equations to inclusions is a natural step in the generalization of 
singleton-valued analysis to set-valued analysis. 

Example 1 leads to the formulation of a weaker type of convergence, which 
can be expressed as the probability of the following event, 

$$\left\{\omega \in \Omega : \limsup_{n\to\infty} \Theta_n(\omega) \subseteq \Theta\right\}. \tag{5}$$

However, we here encounter a slightly different problem than the one highlighted 
in our first example. This second issue can be illustrated through another coun- 
terexample, which shows that this particular type of a.s. convergence does not 
agree with the analogous real-valued a.s. convergence. That is, the reformulation 
of a given real-valued random variable into an abstract-valued setting, equipped 
with the same topology produces a divergent Frechet sample mean in terms 
of equation (5). As a result, we obtain the somewhat counterintuitive result 
that the arithmetic sample mean differs from the corresponding Frechet sample 
mean. 

Example 2. Consider the same setting described in example 1, where now
$r = 2$ (see panel (b) of figure 1). One can immediately see that the theoretical
Frechet mean is a singleton set,

$$\Theta^{2} = \operatorname*{arginf}_{x' \in \mathcal{X}} \sum_{x \in \{-1,1\}} d(x,x')^{2} \, \mathbb{P}[x] = \{0\},$$

which coincides with the expected value of the real-valued random variable $X$.
For the Frechet sample mean, we know from example 1 that $\mathbb{P}[\{S_n = 0\} \text{ i.o.}] = 0$,
and therefore the probability of the sequence of empirical Frechet means
including $\mathbb{E}[X]$ infinitely often is null. That is, for $r = 2$, we have $\mathbb{P}[\{\Theta_n = \{0\}\} \text{ i.o.}] = 0$.
Observe that the same is true for any other specific sequence of realizations of
$X$. Consider the case of $S_{3n} = n x_1 + 2n x_2$, where $x_1 = -1$ and $x_2 = 1$. For this
subsequence, there exists a unique infimum, which is $\theta_n = 1/3$. The probability
of this event occurring is as follows,

$$\mathbb{P}[\{S_{3n} = n x_1 + 2n x_2\}] = \binom{3n}{n} \left(\frac{1}{2}\right)^{3n} \approx (1/2)^{5n},$$

which was approximated using Stirling's formula. Clearly, all possible values
of the Frechet sample mean of $X$ can be represented as a formula of the form
$n x_1 + a n x_2$, for some $a \in \mathbb{N}$. Using the Borel-Cantelli lemma, it therefore follows
that there does not exist a point in $[-1,1]$ that $\Theta_n$ will visit infinitely often, and
hence $\limsup \Theta_n = \liminf \Theta_n = \emptyset$. By contrast, the arithmetic sample mean,
$\bar{X}_n := n^{-1} \sum_{i=1}^{n} X_i$ trivially converges to the expected value of $X$ a.s., since
for every $\epsilon > 0$, there exists an $N \ge 1$, for which $d(\bar{X}_n(\omega), \mathbb{E}[X]) < \epsilon$, for
every $n > N$, for almost every $\omega \in \Omega$. Thus, for this example, we reach the
counterintuitive conclusion that $\bar{X}_n \notin \limsup \Theta_n$, for every $n$.
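
The identification of the order-2 Frechet sample mean with the arithmetic sample mean in this example can be checked directly. The following worked step is a sketch under the assumptions of example 2 only (real-valued observations in [-1,1] with d(x,y) = |x - y|); the symbol $\bar{X}_n$ denotes the arithmetic sample mean.

```latex
\frac{1}{n}\sum_{i=1}^{n} (X_i - x')^{2}
  = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^{2} + (\bar{X}_n - x')^{2},
\qquad \text{since } \sum_{i=1}^{n} (X_i - \bar{X}_n) = 0 .
```

The first term does not depend on $x'$, so the sample energy is minimized uniquely at $x' = \bar{X}_n \in [-1,1]$. For $n$ observations equal to $-1$ and $2n$ observations equal to $1$, this gives $\bar{X}_{3n} = (2n - n)/(3n) = 1/3$, in agreement with the value quoted in the example.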

This paradoxical disagreement between the divergence of the Frechet sample 
mean and the classical convergence of the arithmetic sample mean in such a 
simple example requires a strengthening of our definition of the a.s. convergence 
of $\Theta_n$. This particular problem seems to have been implicitly identified by
Ziezold (1977), as this author proposed the following type of convergence, which 
specializes the event presented in equation (5), 



$$\left\{\omega \in \Omega : \bigcap_{n=1}^{\infty} \overline{\bigcup_{m=n}^{\infty} \Theta_m(\omega)} \subseteq \Theta\right\}, \tag{6}$$

where $\overline{A}$ indicates the closure of set $A$ in $\mathcal{X}$. For convenience, this particular
type of convergence will be denoted by $\overline{\limsup}\, \Theta_n \subseteq \Theta$, a.s., where the $\overline{\limsup}$
operator is here defined with respect to set inclusion on the power set of $\mathcal{X}$.
It is easy to see why definition (6) resolves the issue illustrated in example 2.
By taking the closure of $\bigcup_{m=n}^{\infty} \Theta_m$, we include all the elements for which there
exists a sequence of $\theta_n$'s converging to $\mathbb{E}[X]$, and therefore

$$\mathbb{E}[X] \in \overline{\bigcup_{m=n}^{\infty} \Theta_m},$$

for every $n$, which implies that $\overline{\limsup}\, \Theta_n = \{\mathbb{E}[X]\}$, as desired, thereby ensuring
complete agreement between the classical and Frechet inferential approaches
for this particular example. Note that these issues are neither related to the
completeness of the underlying space of interest, nor associated to the question
of the non-emptiness of $\Theta$.

Since Sverdrup-Thygeson (1981) assumed that $\mathcal{X}$ is compact, it follows that
$\Theta$ and $\Theta_n$ are non-empty, in this case. The separability of $\mathcal{X}$ is not sufficient to
ensure that $\Theta$ and the $\Theta_n$'s are non-empty. Nonetheless, observe that if $\Theta_n = \emptyset$,
then the events in equations (5) and (6) are trivially almost certain, since $\emptyset \subseteq A$,
for all $A \subseteq \mathcal{X}$, as originally observed by Ziezold (1977).

2.3. Kuratowski Upper Limit 

It can easily be shown that the type of convergence envisaged by Ziezold (1977) 
is, in fact, equivalent to the celebrated upper limit introduced by Kuratowski 
(1966), which has been adopted as the preferred type of convergence in
set-valued analysis (see Aubin and Frankowska, 2009). The Kuratowski upper limit
is defined over a metric space $(\mathcal{X},d)$, for some sequence of subsets $A_n \subseteq \mathcal{X}$, as
follows

$$\begin{aligned} \operatorname*{Limsup}_{n\to\infty} A_n &:= \left\{x \in \mathcal{X} : \liminf_{n\to\infty} d(x,A_n) = 0\right\} \\ &= \left\{x \in \mathcal{X} : \{A_n \cap N_{\epsilon}(x) \neq \emptyset\} \text{ i.o.}, \ \forall\, \epsilon > 0\right\}, \end{aligned} \tag{7}$$

where liminf and Limsup are taken with respect to real numbers and subsets
of $\mathcal{X}$, respectively, and with $N_{\epsilon}(x) := \{x' \in \mathcal{X} : d(x,x') < \epsilon\}$. The second
formulation of Limsup in equation (7) immediately follows from the positivity
of the metric. Also, observe that the Kuratowski upper limit is equivalent to
the set of cluster points of the sequences, $x_n \in A_n$ (Aubin and Frankowska,
2009). Clearly, the Kuratowski upper limit of any sequence of sets is closed, and
moreover, it contains the conventional set-theoretical upper limit, such that for
any sequence of random sets $A_n$,

$$\limsup_{n\to\infty} A_n \subseteq \operatorname*{Limsup}_{n\to\infty} A_n.$$

Importantly, it can be easily shown that the Kuratowski upper limit and the 
quantity studied by Ziezold (1977) are equivalent, as stated in the following 
lemma. 

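Lemma 2. Given a metric space $(\mathcal{X},d)$, for any sequence of sets $A_n \subseteq \mathcal{X}$,

$$\overline{\limsup}\, A_n = \operatorname{Limsup} A_n, \quad \text{where} \quad \overline{\limsup}\, A_n := \bigcap_{n=1}^{\infty} \overline{\bigcup_{m=n}^{\infty} A_m}.$$

Proof. Clearly, $\overline{\limsup}\, A_n = \emptyset$, if and only if, $\operatorname{Limsup} A_n = \emptyset$. Thus, assume
that these two outer limits are non-empty, and choose $x_0 \in \overline{\limsup}\, A_n$. Then,
$x_0 \in \overline{\bigcup_{m=N}^{\infty} A_m}$ for every $N$ and there exists a subsequence $x_k$ such that
$x_k \in A_{n_k}$, for every $k$, which satisfies $x_k \to x_0$. Hence, we have $\liminf d(x_0,A_n) = 0$,
and by definition (7), $\overline{\limsup}\, A_n \subseteq \operatorname{Limsup} A_n$.

Conversely, choose $x_0 \in \operatorname{Limsup} A_n$. Then, there exists a subsequence $x_k$ such
that $x_k \in A_{n_k} \cap N_{\epsilon}(x_0)$, for every $k$ and for every $\epsilon > 0$, which satisfies $x_k \to x_0$,
as $k \to \infty$. Clearly, such subsequences can be found for every $N \in \mathbb{N}$, such that
$n_1 > N$. This immediately implies that $x_0 \in \bigcap_{N=1}^{\infty} \overline{\bigcup_{m=N}^{\infty} A_m}$ and therefore
$\overline{\limsup}\, A_n \supseteq \operatorname{Limsup} A_n$, which completes the proof. □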
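
The gap between the set-theoretical upper limit and the Kuratowski upper limit, and the role played by the closure in lemma 2, can be seen on a short example. The sequence below is an illustrative assumption (it does not appear in the source), taken in $\mathbb{R}$ with the usual metric.

```latex
\begin{gather*}
A_n := \{1/n\}, \qquad n = 1, 2, \ldots \\
\bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m = \emptyset,
\qquad
\bigcap_{n=1}^{\infty} \overline{\bigcup_{m=n}^{\infty} A_m} = \{0\},
\qquad
\Bigl\{x \in \mathbb{R} : \liminf_{n\to\infty} d(x, A_n) = 0\Bigr\} = \{0\}.
\end{gather*}
```

No point belongs to infinitely many of the $A_m$, so the set-theoretical upper limit is empty. Each tail union $\{1/m : m \ge n\}$ has closure $\{1/m : m \ge n\} \cup \{0\}$, and only $0$ survives the intersection over $n$. Finally, $d(0,A_n) = 1/n \to 0$, while $d(x,A_n) \to |x| > 0$ for every $x \neq 0$, so the Kuratowski upper limit is also $\{0\}$.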

Observe that $\operatorname{Limsup} A_n$ can be empty. Consider the following diverging
sequence of sets, $A_n := [n-1, n+1]$, for every $n$. It is immediate that
$\operatorname{Limsup} A_n = \emptyset$. Throughout the rest of the paper, we will neither assume the
existence nor the uniqueness of $\Theta^{r}$ and the $\Theta^{r}_{n}$'s. In particular, in the sequel,
$\Theta^{r}$ may be either empty, a set or a singleton set.

3. Almost Sure Consistency of Frechet Sample Mean 

In this section, we describe a generalization of the strong law of large numbers
due to Ziezold (1977) to Frechet sample means of any order. This generalization
also allows us to re-formulate this original result using the Kuratowski upper
limit.

Theorem 1. Given a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and a separable bounded
metric space $(\mathcal{X},d)$, let $X_1,\ldots,X_n$ be a sequence of independent and identically
distributed (iid) abstract-valued random variables, such that $X_i : \Omega \to \mathcal{X}$, for
every $X_i$. Then,

$$\sigma^{r}_{n} \to \sigma^{r} \ \text{a.s.}, \quad \text{and} \quad \operatorname*{Limsup}_{n\to\infty} \Theta^{r}_{n} \subseteq \Theta^{r} \ \text{a.s.},$$

for every finite $r \ge 1$, and where Limsup is defined as in equation (7).

The particular mode of convergence of the Frechet sample mean used in
theorem 1 will sometimes be denoted by $X_n \to X$, which implies that
$\operatorname{Limsup} X_n \subseteq X$ with probability one.

Remark 2. The integrability of the $r$-th order metric is implied by the finiteness
of both $d$ and $\mu$. Since $d(x,y) \le M$, for every $x,y \in \mathcal{X}$, we have for any arbitrary
$a \in \mathcal{X}$ and finite $r \ge 1$,

$$\mathbb{E}[d(X,a)^{r}] = \int_{\mathcal{X}} |d(x,a)|^{r} \, d\mu(x) \le \int_{\mathcal{X}} M^{r} \, d\mu(x) = M^{r} \mu(\mathcal{X}) < \infty,$$

by the monotonicity and linearity of the Lebesgue integral, and the fact that $\mu$ is a
probability measure. Surprisingly, Ziezold (1977) originally assumed that $\mathbb{E}[d(X,a)^{r}] < \infty$
for at least one $a \in \mathcal{X}$. This condition, however, is redundant, since this author
also assumed that the metric on $\mathcal{X}$ is bounded and that the sample space of
interest is a probability space. Although this only makes $\mathcal{X}$ a bounded metric
space, and not a totally bounded metric space (i.e. $\mathcal{X}$ may not be coverable by a
finite number of open balls of arbitrarily small radii), the boundedness of $d$ is nonetheless
sufficient for deducing the integrability of $d^{r}$.

Remark 3. By contrast, the integrability of the exponentiated metric was
not explicitly assumed by Sverdrup-Thygeson (1981). This author, however,
assumed that $\mathcal{X}$ is compact, which implies that $d^{r}$ is integrable for any
finite $r \ge 1$. Indeed, compact metric spaces are totally bounded and
therefore have bounded metric. If $d(x,y) \le \operatorname{diam}(\mathcal{X})$ for every $x,y \in \mathcal{X}$, then
$d(x,y)^{r} \le \operatorname{diam}(\mathcal{X})^{r} < \infty$, and it therefore follows that $\int_{\mathcal{X}} |d(x,y)|^{r} \, d\mu(x) \le
\int_{\mathcal{X}} \operatorname{diam}(\mathcal{X})^{r} \, d\mu(x) = \operatorname{diam}(\mathcal{X})^{r} \mu(\mathcal{X})$, for every $r \ge 1$, using again the linearity of
the Lebesgue integral. Thus, if $\mathcal{X}$ is compact, $d$ is $r$-integrable with respect to
any measure satisfying $\mu(\mathcal{X}) < \infty$.

The key to the proof of theorem 1 is based on a classical result, due to Rao
(1962), which stipulates the conditions under which the weak convergence of a
probability measure is equivalent to the uniform convergence of a probability
measure, in a sense made clear in theorem 2. This can be seen as a generalization
of the Glivenko-Cantelli lemma to random variables taking values in separable
metric spaces (see also Parthasarathy, 1967, chap. 2). In this result, we will need
to define a class of functions on the separable space $\mathcal{X}$, which we will denote
by $\mathcal{F} := \mathcal{F}(\mathcal{X})$, whereby every $f \in \mathcal{F}$ is a real-valued continuous function that
satisfies $f : \mathcal{X} \to \mathbb{R}$. Such a class of functions is said to be uniformly bounded
when there exists an $M \in \mathbb{R}$, such that $|f(x)| \le M$, for every $f \in \mathcal{F}$, and every
$x \in \mathcal{X}$. In addition, $\mathcal{F}$ is equicontinuous at a point $x_0 \in \mathcal{X}$, if for every
$\epsilon > 0$, there exists $\delta(x_0) > 0$, such that for every $u \in N_{\delta}(x_0) := \{u \in \mathcal{X} :
d(x_0,u) < \delta\}$, we have $|f(x_0) - f(u)| < \epsilon$, for every $f \in \mathcal{F}$. The class $\mathcal{F}$ is
said to be equicontinuous if it is equicontinuous at every $x \in \mathcal{X}$. Finally, $\mathcal{F}$ is
said to be uniformly equicontinuous if $\delta$ does not depend on $x_0$. We will denote
the collection of all finite measures on $\mathcal{B}$ by $\mathcal{M}(\mathcal{B})$, and $\Rightarrow$ will indicate weak
convergence.

Theorem 2 (Rao, 1962, p. 672). Let $\mathcal{F}(\mathcal{X})$ be a class of real-valued functions on
a separable space $\mathcal{X}$, and assume that $\mathcal{F}(\mathcal{X})$ is (i) dominated by a continuous
integrable function on $\mathcal{X}$, and that (ii) $\mathcal{F}(\mathcal{X})$ is equicontinuous. If, for some
sequence of measures $\mu_n \in \mathcal{M}(\mathcal{B})$, and $\mu \in \mathcal{M}(\mathcal{B})$, we have $\mu_n \Rightarrow \mu$, a.s., then

$$\lim_{n\to\infty} \sup_{f \in \mathcal{F}} \left| \int f \, d\mu_n - \int f \, d\mu \right| = 0, \quad \text{a.s.}$$

The following lemma will be used in the proof of theorem 1. This result links
the properties of a bounded metric space with the conditions required in Rao's
(1962) theorem. For this purpose, we will require the following classes of point
functions on a metric space (see Searcoid, 2007).

Definition 1. For any metric space $(\mathcal{X},d)$, the $z$-point function is defined as
$d_z(x) := d(z,x)$ for every $x \in \mathcal{X}$. The class of point functions on $(\mathcal{X},d)$ is then
denoted by $\mathcal{D}(\mathcal{X}) := \{d_z : \forall\, z \in \mathcal{X}\}$. Similarly, we will make use of the class of
exponentiated point functions, defined as follows,

$$\mathcal{D}^{r}(\mathcal{X}) := \{d_z^{r} : \forall\, z \in \mathcal{X}\},$$

for every finite $r \ge 1$, and where elements in either $\mathcal{D}$ or $\mathcal{D}^{r}$ will be denoted by
$d_z$, and $d_z^{r}$, respectively, or simply $z$, when there is no ambiguity.



Lemma 3. If $(\mathcal{X}, d)$ is a bounded metric space, then $\mathcal{D}^r(\mathcal{X})$ is uniformly bounded
and uniformly equicontinuous for every finite $r \geq 1$.

Proof. By the boundedness of $d$, there exists an $M \in \mathbb{R}$, such that $d(x, y) \leq M$,
for every $x,y \in \mathcal{X}$. Therefore, $d_z(x) \leq M$, for every $x \in \mathcal{X}$, for every $d_z \in \mathcal{D}$,
and thus $\mathcal{D}$ is uniformly bounded. Moreover, since $d_z^r(x) \leq M^r < \infty$, for every
finite $r \geq 1$, it follows that each $\mathcal{D}^r$ also forms a uniformly bounded class of
functions. Next, by the reverse triangle inequality, we have
$|d_z(x) - d_z(x_0)| \leq d(x, x_0)$, for all $x, x_0, z \in \mathcal{X}$, thereby proving the (uniform)
equicontinuity of the class $\mathcal{D}$ on $\mathcal{X}$. For the case of $r > 1$, we consider the
exponentiated version of the triangle inequality. Using the binomial expansion,
we obtain

\[
d(z,x)^r \leq \big( d(z,x_0) + d(x_0,x) \big)^r
= d(z,x_0)^r + \sum_{k=1}^{r-1} \binom{r}{k} d(z,x_0)^{r-k} d(x_0,x)^k + d(x_0,x)^r,
\]

and similarly, for any given $x_0 \in \mathcal{X}$, we have
$d(z,x_0)^r \leq d(z,x)^r + \sum_{k=1}^{r-1} \binom{r}{k} d(z,x)^{r-k} d(x,x_0)^k + d(x,x_0)^r$.
Combining these two inequalities and invoking the symmetry of $d$, we have

\[
|d(z,x)^r - d(z,x_0)^r| \leq d(x_0,x)^r + d(x_0,x) M^{r-1} \sum_{k=1}^{r-1} \binom{r}{k}
\leq d(x_0,x) M^{r-1} \Big( 1 + \sum_{k=1}^{r-1} \binom{r}{k} \Big),
\]

where $M$ is the uniform bound on the class $\mathcal{D}$. Now, choose $\delta = \epsilon / (\gamma M^{r-1})$, where
$\gamma := 1 + \sum_{k=1}^{r-1} \binom{r}{k}$, such that if $d(x,x_0) < \delta$, then
$|d_z^r(x) - d_z^r(x_0)| < \gamma \delta M^{r-1} = \epsilon$, for every $x \in N_\delta(x_0)$, for every
$d_z^r \in \mathcal{D}^r$, thence proving the equicontinuity of $\mathcal{D}^r$ at $x_0$. Since $\delta$ did not
depend on the choice of $x_0$, it follows that $\mathcal{D}^r$ is also uniformly
equicontinuous. □
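The Lipschitz-type bound obtained in the proof of lemma 3 can be checked numerically. The following is a minimal sketch, assuming a finite set of points in the unit square equipped with the Euclidean metric and an integer order $r$; the function names `lipschitz_constant` and `max_violation` are illustrative and do not come from the paper.

```python
import itertools
import math
import random


def lipschitz_constant(r: int, M: float) -> float:
    """Return gamma * M**(r-1), with gamma = 1 + sum_{k=1}^{r-1} C(r, k)."""
    gamma = 1 + sum(math.comb(r, k) for k in range(1, r))
    return gamma * M ** (r - 1)


def max_violation(points, r: int) -> float:
    """Largest value of |d(z,x)^r - d(z,x0)^r| - gamma*M^(r-1)*d(x,x0).

    A non-positive result means the bound of lemma 3 holds for every
    triple (z, x, x0) drawn from `points`.
    """
    # M is the diameter of the finite point set, hence a bound on the metric.
    M = max(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    L = lipschitz_constant(r, M)
    worst = -math.inf
    for z, x, x0 in itertools.product(points, repeat=3):
        lhs = abs(math.dist(z, x) ** r - math.dist(z, x0) ** r)
        worst = max(worst, lhs - L * math.dist(x, x0))
    return worst


# Assumed toy data: 25 pseudo-random points in the unit square.
rng = random.Random(0)
sample_points = [(rng.random(), rng.random()) for _ in range(25)]

if __name__ == "__main__":
    print(max_violation(sample_points, r=3))
```

Since the constant $\gamma M^{r-1}$ does not depend on $z$ or $x_0$, a single $\delta$ works for the whole class, which is the uniform equicontinuity used later in the proofs.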

Proof of Theorem 1. Observe that the theorem is trivially verified if = 0. 
Thus, assume that is non-empty. We here adopt the line of argument followed 
by Sverdrup-Thygeson (1981). However, since we are not assuming compactness, 
there are several aspects of Sverdrup-Thygeson's proof that become somewhat
delicate. In the sequel, we will make use of the following quantities formulated
with respect to the class of point functions described in definition 1. For every
$z \in \mathcal{X}$, let

\[
T_n(z) := \frac{1}{n} \sum_{i=1}^n d_z^r(X_i) - \int_{\mathcal{X}} d_z^r(x) \, d\mu(x), \qquad (8)
\]

and similarly,

\[
T_n^*(z) := \frac{1}{n} \sum_{i=1}^n d_z^r(X_i) - \int_{\mathcal{X}} d_\theta^r(x) \, d\mu(x). \qquad (9)
\]

Since $T_n(z)$ is real-valued, one can invoke the strong law of large numbers for
real-valued random variables, which gives

\[
T_n(z) \to 0, \quad \text{a.s.}, \quad \forall \, z \in \mathcal{X}. \qquad (10)
\]



Note, however, that since we have used infima in the definitions of the Frechet
theoretical and sample means in equations (1) and (2), it follows that the
convergence $T_n(z) \to 0$ is not assured when $z$ is an element of $\Theta$ or an element
of $\widehat{\Theta}_n$. However, as established in lemma 3, the class of point functions, $\mathcal{D}^r(\mathcal{X})$,
is uniformly bounded and (uniformly) equicontinuous. Moreover, in remark 2,
we have seen that $\mathbb{E}[d_z^r(X)] < \infty$ is implied by the boundedness of $d$. Thus,
it follows that there exists a continuous integrable function, i.e. $f(x) := M^r$,
dominating every $d_z^r \in \mathcal{D}^r$. Moreover, a classical result on the convergence of
empirical measures based on iid random variables taking values in separable
metric spaces (see Parthasarathy, 1967, theorem 7.1, p. 53) implies that

\[
\mu_n \Rightarrow \mu, \quad \text{a.s.}, \qquad (11)
\]

where $\mu_n := n^{-1} \sum_{i=1}^n \delta_{X_i}$ is the empirical measure on $\mathcal{X}$. Therefore, we are
in a position to apply theorem 2, which shows that the empirical measure, $\mu_n$,
converges uniformly with probability 1. That is,

\[
\sup_{d_z^r \in \mathcal{D}^r} \left| \frac{1}{n} \sum_{i=1}^n d_z^r(X_i) - \int_{\mathcal{X}} d_z^r(x) \, d\mu(x) \right| \to 0, \quad \text{a.s.},
\]

which may be re-written as

\[
\sup_{d_z^r \in \mathcal{D}^r} |T_n(z)| = \sup_{z \in \mathcal{X}} |T_n(z)| \to 0, \quad \text{a.s.} \qquad (12)
\]

Consequently, $T_n(\widehat{\theta}_n) \to 0$, a.s., and $T_n(\theta) \to 0$, a.s., for every $\widehat{\theta}_n \in \widehat{\Theta}_n$ and
every $\theta \in \Theta$, respectively.
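The uniform convergence in equation (12) can be illustrated on a simple bounded metric space. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes $X$ uniform on $[0,1]$ with $d(x,y) = |x - y|$ and $r = 2$, for which $\mathbb{E}[d_z^2(X)] = 1/3 - z + z^2$ is available in closed form, and it approximates the supremum over $z$ on a finite grid. The function name `sup_abs_T_n` is hypothetical.

```python
import random


def sup_abs_T_n(n: int, grid_size: int = 201, seed: int = 0) -> float:
    """Approximate sup_z |T_n(z)| over a grid of z in [0, 1].

    Assumes X ~ Uniform[0, 1], d(x, y) = |x - y| and r = 2.
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    m1 = sum(xs) / n                   # empirical first moment
    m2 = sum(x * x for x in xs) / n    # empirical second moment
    worst = 0.0
    for j in range(grid_size):
        z = j / (grid_size - 1)
        empirical = m2 - 2 * z * m1 + z * z   # (1/n) * sum_i (X_i - z)^2
        theoretical = 1 / 3 - z + z * z       # E[(X - z)^2] for X ~ U[0, 1]
        worst = max(worst, abs(empirical - theoretical))
    return worst


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, sup_abs_T_n(n))
```

In this toy setting the supremum is controlled by the errors of the first two empirical moments, so it shrinks as $n$ grows, uniformly in $z$.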

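Further, from the definition of $\widehat{\Theta}_n$ and $\Theta$, we can 'sandwich' $T_n^*(\widehat{\theta}_n)$ in the
following manner. Firstly, observe that by the minimality of the $\theta$'s,

\[
T_n(\widehat{\theta}_n) = \frac{1}{n} \sum_{i=1}^n d_{\widehat{\theta}_n}^r(X_i) - \int_{\mathcal{X}} d_{\widehat{\theta}_n}^r(x) \, d\mu(x)
\leq \frac{1}{n} \sum_{i=1}^n d_{\widehat{\theta}_n}^r(X_i) - \int_{\mathcal{X}} d_\theta^r(x) \, d\mu(x) = T_n^*(\widehat{\theta}_n). \qquad (13)
\]

Secondly, by the minimality of the $\widehat{\theta}_n$'s, we similarly have,

\[
T_n^*(\widehat{\theta}_n) = \frac{1}{n} \sum_{i=1}^n d_{\widehat{\theta}_n}^r(X_i) - \int_{\mathcal{X}} d_\theta^r(x) \, d\mu(x)
\leq \frac{1}{n} \sum_{i=1}^n d_\theta^r(X_i) - \int_{\mathcal{X}} d_\theta^r(x) \, d\mu(x) = T_n(\theta). \qquad (14)
\]

Thence, combining equations (13) and (14), we obtain,

\[
T_n(\widehat{\theta}_n) \leq T_n^*(\widehat{\theta}_n) \leq T_n(\theta),
\]

such that, using equation (12),

\[
|T_n^*(\widehat{\theta}_n)| \leq \max\{ |T_n(\widehat{\theta}_n)|, |T_n(\theta)| \} \to 0, \quad \text{a.s.}, \qquad (15)
\]

which proves the a.s. convergence of $\widehat{\sigma}_n^r$ to $\sigma^r$.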

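We now turn to the convergence properties of the Frechet sample mean of
the $r$-th order, $\widehat{\Theta}_n^r$. Here, we generalize Ziezold's (1977) proof strategy to Frechet
sample means of any order (see also Molchanov, 2005, p. 185). Choosing

\[
\theta \in \operatorname{Limsup} \widehat{\Theta}_n^r,
\]

it then suffices to show that $\theta \in \Theta^r$, which is verified if
$\mathbb{E}[d(X,\theta)^r] \leq \mathbb{E}[d(X,x')^r]$, for every $x' \in \mathcal{X}$. We proceed by constructing the
following subsequence of natural numbers.

Observe that from the definition of the Kuratowski upper limit and the
equivalence relation reported in lemma 2, it follows that
$\theta \in \operatorname{Cl}(\bigcup_{m=n}^{\infty} \widehat{\Theta}_m)$, for every $n$, where $\operatorname{Cl}(\cdot)$ denotes the closure of a set.
Thus, one can construct a subsequence, $\{n_k : k \in \mathbb{N}\}$, such that for every $k$,
there exists an element $\widehat{\theta}_k \in \bigcup_{m=k}^{\infty} \widehat{\Theta}_m$, which satisfies $d(\widehat{\theta}_k,\theta) < 1/k$.
Moreover, we can define $n_k := \min\{n \in \mathbb{N} : n \geq k, \widehat{\theta}_k \in \widehat{\Theta}_n\}$. Now, from a
standard consequence of the Minkowski inequality, we have

\[
\Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i,\theta)^r \Big)^{1/r}
\leq \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(\theta,\widehat{\theta}_k)^r \Big)^{1/r}
+ \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i,\widehat{\theta}_k)^r \Big)^{1/r},
\]

which gives, since the first term on the right-hand side is equal to
$d(\theta,\widehat{\theta}_k) < 1/k$,

\[
\Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i,\theta)^r \Big)^{1/r}
< \frac{1}{k} + \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i,\widehat{\theta}_k)^r \Big)^{1/r}.
\]

As $k \to \infty$, it then follows from equation (12) that since $(n_k)_{k \in \mathbb{N}}$ is a
subsequence of $(n)_{n \in \mathbb{N}}$, we obtain

\[
\big( \mathbb{E}[d(X,\theta)^r] \big)^{1/r} \leq \liminf_{k \to \infty} \Big( \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i,\widehat{\theta}_k)^r \Big)^{1/r}, \qquad (16)
\]

where liminf is here taken with respect to non-negative real numbers. Moreover,
by construction, each $\widehat{\theta}_k$ is minimal with respect to any element $x' \in \mathcal{X}$, such
that

\[
\frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i, \widehat{\theta}_k)^r \leq \frac{1}{n_k} \sum_{i=1}^{n_k} d(X_i, x')^r, \qquad (17)
\]

for every $x' \in \mathcal{X}$ and $k \in \mathbb{N}$, where the right-hand side converges a.s. to
$\mathbb{E}[d(X,x')^r]$ by the strong law of large numbers. Observe that given the
continuity and monotonicity of $g(x) := x^{1/r}$ on the positive real numbers, we
have $\liminf g(x_n) = g(\liminf x_n)$, for sequences satisfying $x_n \in \mathbb{R}^+$. Therefore, it
suffices to combine equations (16) and (17) in order to obtain
$\mathbb{E}[d(X,\theta)^r] \leq \mathbb{E}[d(X,x')^r]$, for every $x' \in \mathcal{X}$, as required. Thence, $\theta \in \Theta^r$ a.s.,
but since $\theta$ was arbitrary, we have $\operatorname{Limsup} \widehat{\Theta}_n^r \subseteq \Theta^r$ a.s., as required. □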
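The set-valued nature of this result can be illustrated on a small example. The sketch below is an illustration under stated assumptions, not a construction from the paper: it uses a four-point space with the discrete metric, which is bounded, and a measure `mu` with two equally likely modes. In that case the theoretical Frechet mean is the set of both modes, whereas a given sample mean set typically contains only one of them, so that inclusion rather than equality is the natural statement. All function and variable names are hypothetical.

```python
import random
from collections import Counter


def discrete_metric(x, y) -> float:
    """Bounded metric: 0 if the points coincide, 1 otherwise."""
    return 0.0 if x == y else 1.0


def frechet_sample_mean_set(sample, space, d, r=2, tol=1e-12):
    """All minimizers over `space` of (1/n) * sum_i d(X_i, z)**r."""
    n = len(sample)
    counts = Counter(sample)
    cost = {z: sum(c * d(x, z) ** r for x, c in counts.items()) / n for z in space}
    best = min(cost.values())
    return {z for z, v in cost.items() if v <= best + tol}


def frechet_mean_set(weights, d, r=2, tol=1e-12):
    """All minimizers over the space of sum_x mu(x) * d(x, z)**r."""
    cost = {z: sum(w * d(x, z) ** r for x, w in weights.items()) for z in weights}
    best = min(cost.values())
    return {z for z, v in cost.items() if v <= best + tol}


# Assumed toy measure on a four-point space with two equally likely modes.
mu = {"a": 0.4, "b": 0.4, "c": 0.1, "d": 0.1}
theta = frechet_mean_set(mu, discrete_metric)

rng = random.Random(1)
sample = rng.choices(list(mu), weights=list(mu.values()), k=5000)
theta_hat = frechet_sample_mean_set(sample, list(mu), discrete_metric)

if __name__ == "__main__":
    print("theoretical mean set:", theta)
    print("sample mean set:", theta_hat)
```

Under the discrete metric the Frechet function at $z$ equals $1 - \mu(\{z\})$, so the minimizers are the modes of the measure, and the sample version selects the most frequent observed values.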



3.1. Extension to Equicontinuous Transformations 

Our proof of theorem 1 crucially relies on the equicontinuity of the functional
space $\mathcal{D}^r(\mathcal{X})$. One could therefore extend the previous results by allowing this
class of functions to represent any continuous transformations of the point
functions, $d_z(\cdot)$.



4. Restricted Frechet Means 



Theorem 1 can be extended to the case of the restricted Frechet mean. This 
is a concept that was originally introduced and studied by Sverdrup-Thygeson 
(1981). Interest in restricted Frechet means is motivated by the fact that the 
domain of some abstract-valued random variables may be too large to be opti- 
mized in a reasonable amount of time. In such cases, the Frechet sample mean 
may be more suitably defined as one of the elements in the sample at hand. 
That is, consider the following definition of the restricted Frechet sample mean 
and variance, 

\[
\widehat{\Theta}_n^{*,r} := \operatorname*{argmin}_{x' \in \mathbf{X}} \frac{1}{n} \sum_{i=1}^n d(X_i, x')^r
\quad \text{and} \quad
\widehat{\sigma}_n^{*,r} := \min_{x' \in \mathbf{X}} \frac{1}{n} \sum_{i=1}^n d(X_i, x')^r,
\]

where $\mathbf{X} := \{X_1, \ldots, X_n\} \subseteq \mathcal{X}$ denotes the set of sampled variables. In practice,
the sample mean is chosen among the available sampled iid realizations from $\mathcal{X}$.
In particular, observe that we employed the minimum instead of the infimum in
the definitions of both $\widehat{\Theta}_n^{*,r}$ and $\widehat{\sigma}_n^{*,r}$, as the required optimal values necessarily
exist, although they may not be unique. Hence, observe that $\widehat{\Theta}_n^{*,r} \neq \emptyset$ for any $n$.
Theoretical analogues of these restricted quantities can be defined as follows,

\[
\Theta^{*,r} := \operatorname*{argmin}_{x' \in W} \int_{\mathcal{X}} d(x,x')^r \, d\mu(x),
\quad \text{and} \quad
\sigma^{*,r} := \min_{x' \in W} \int_{\mathcal{X}} d(x,x')^r \, d\mu(x),
\]

where $W$ is the support of $\mu$, denoted $\operatorname{supp}(\mu)$, which is assumed to be closed.

Observe that this closure condition is required for ensuring that the Frechet
mean is contained in $\operatorname{supp}(\mu)$. For notational convenience, we will also assume
in the sequel that $r = 2$, and omit that superscript. As previously, the elements
of $\Theta^*$ and $\widehat{\Theta}_n^*$ will be denoted by $\theta^*$'s and $\widehat{\theta}_n^*$'s, respectively. We here prove a
generalization of a consistency result due to Sverdrup-Thygeson (1981) on the
a.s. convergences of the restricted Frechet sample mean and variance. Observe
that a restricted Frechet sample mean can only converge to a restricted Frechet
mean.
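The restricted Frechet sample mean and variance defined above only require a search over the sampled points. The following is a minimal sketch of that search, assuming real-valued observations and a truncated absolute distance as the bounded metric; the names `restricted_frechet_sample_mean` and `bounded_abs`, and the truncation level, are illustrative choices and not part of the paper.

```python
def restricted_frechet_sample_mean(sample, d, r=2):
    """Return (set of minimizers, minimal value) of (1/n) * sum_i d(X_i, c)**r,
    where the candidate c ranges over the sample itself."""
    n = len(sample)
    cost = [sum(d(x, c) ** r for x in sample) / n for c in sample]
    best = min(cost)
    minimizers = {c for c, v in zip(sample, cost) if v == best}
    return minimizers, best


def bounded_abs(x, y, cap=20.0):
    """Absolute distance truncated at `cap`, so that the metric is bounded."""
    return min(abs(x - y), cap)


# Assumed toy sample of four real-valued observations.
means, variance = restricted_frechet_sample_mean([0.0, 1.0, 2.0, 10.0], bounded_abs)

if __name__ == "__main__":
    print(means, variance)
```

The cost of this exhaustive search is quadratic in $n$, which is the practical motivation for restricting the minimization to the sample when the full space is too large to optimize over.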

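Theorem 3. Under the conditions of theorem 1, for every $r \geq 1$, and assuming
that $\operatorname{supp}(\mu)$ is closed,

\[
\widehat{\sigma}_n^{*,r} \to \sigma^{*,r} \ \text{a.s.}, \quad \text{and} \quad \operatorname{Limsup}_{n \to \infty} \widehat{\Theta}_n^{*,r} \subseteq \Theta^{*,r} \ \text{a.s.}
\]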

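Proof. Let us denote a quantity analogous to the ones defined in equations (8)
and (9), but here based on the restricted theoretical Frechet mean,

\[
\mathrm{TR}_n^*(z) := \frac{1}{n} \sum_{i=1}^n d_z^r(X_i) - \int_{\mathcal{X}} d_{\theta^*}^r(x) \, d\mu(x), \qquad (18)
\]

where $\theta^* \in \Theta^*$. We will first demonstrate that

\[
\min_{x' \in \mathbf{X}} |\mathrm{TR}_n^*(x') - \mathrm{TR}_n^*(\theta^*)| \to 0, \quad \text{a.s.} \qquad (19)
\]

In order to prove this a.s. convergence, we need the following quantity,

\[
s(\delta) := \sup_{z \in W} \sup_{d(x,y) < \delta} |d_z^r(x) - d_z^r(y)|, \qquad (20)
\]

where the second supremum is taken over all pairs of elements $x,y \in W$,
satisfying $d(x,y) < \delta$. Since the class of exponentiated point functions on $\mathcal{X}$,
denoted $\mathcal{D}^r$, was shown to be uniformly equicontinuous in lemma 3, it follows that
$s(\delta) \to 0$, as $\delta \to 0$. Moreover, it is straightforward to see that for every $\delta > 0$,
we have

\[
\sup_{d(x,y) < \delta} |\mathrm{TR}_n^*(x) - \mathrm{TR}_n^*(y)|
= \sup_{d(x,y) < \delta} \left| \frac{1}{n} \sum_{i=1}^n d_x^r(X_i) - \frac{1}{n} \sum_{i=1}^n d_y^r(X_i) \right|
\leq \sup_{d(x,y) < \delta} \frac{1}{n} \sum_{i=1}^n |d_x^r(X_i) - d_y^r(X_i)|
\leq s(\delta).
\]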

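Next, let $O_\delta := \{x \in \mathcal{X} : d(x,\theta^*) < \delta\}$, for any $\delta > 0$. Since $\theta^* \in \operatorname{supp}(\mu)$, from
the definition of the restricted Frechet mean, it follows that $\mu(O_\delta) =: \alpha > 0$.
Hence,

\[
\mathbb{P}\big[ \{X_1 \in O_\delta\} \cup \ldots \cup \{X_n \in O_\delta\} \big] = 1 - \prod_{i=1}^n \mathbb{P}\big[ \{X_i \notin O_\delta\} \big] = 1 - (1 - \alpha)^n,
\]

which converges to 1, as $n \to \infty$, for any $\alpha > 0$. Moreover, observe that since
$x' \in W$, for every $x' \in \mathbf{X}$, we also have

\[
\limsup_{n \to \infty} \min_{x' \in \mathbf{X}} |\mathrm{TR}_n^*(x') - \mathrm{TR}_n^*(\theta^*)| \leq s(\delta).
\]

It then suffices to let $\delta \to 0$, in order to obtain equation (19). Now, from the
definitions of $\mathrm{TR}_n^*$ and $T_n$, it can be seen that $\mathrm{TR}_n^*(\theta^*) = T_n(\theta^*)$, and therefore

\[
\mathrm{TR}_n^*(\widehat{\theta}_n^*) = \min_{x' \in \mathbf{X}} \mathrm{TR}_n^*(x') \leq T_n(\theta^*) + \min_{x' \in \mathbf{X}} |\mathrm{TR}_n^*(x') - \mathrm{TR}_n^*(\theta^*)|,
\]

by the optimality of $\widehat{\theta}_n^*$. This can be bounded below by using the minimality of
$\theta^*$, such that

\[
T_n(\widehat{\theta}_n^*) = \frac{1}{n} \sum_{i=1}^n d_{\widehat{\theta}_n^*}^r(X_i) - \int_{\mathcal{X}} d_{\widehat{\theta}_n^*}^r(x) \, d\mu(x)
\leq \frac{1}{n} \sum_{i=1}^n d_{\widehat{\theta}_n^*}^r(X_i) - \int_{\mathcal{X}} d_{\theta^*}^r(x) \, d\mu(x) = \mathrm{TR}_n^*(\widehat{\theta}_n^*).
\]

Combining the last two results, we obtain the following 'sandwich' inequality,

\[
T_n(\widehat{\theta}_n^*) \leq \mathrm{TR}_n^*(\widehat{\theta}_n^*) \leq T_n(\theta^*) + \min_{x' \in \mathbf{X}} |\mathrm{TR}_n^*(x') - \mathrm{TR}_n^*(\theta^*)|.
\]

Thence, this gives a.s.,

\[
|\mathrm{TR}_n^*(\widehat{\theta}_n^*)| \leq \max\Big\{ |T_n(\widehat{\theta}_n^*)|, \ |T_n(\theta^*)| + \min_{x' \in \mathbf{X}} |\mathrm{TR}_n^*(x') - \mathrm{TR}_n^*(\theta^*)| \Big\} \to 0,
\]

using the strong law of large numbers on $T_n(\theta^*)$, and using equation (19) for
the second term in the maximum. This proves that $\widehat{\sigma}_n^* \to \sigma^*$, a.s. The proof
of $\operatorname{Limsup} \widehat{\Theta}_n^* \subseteq \Theta^*$ with probability 1 can be conducted using the same
construction described in the proof of theorem 1, by choosing $\theta^* \in \operatorname{Limsup} \widehat{\Theta}_n^*$, and
noting that $\operatorname{supp}(\mu)$ was assumed to be closed. □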
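The probability $1 - (1 - \alpha)^n$ appearing in the proof above, that at least one of $n$ iid observations falls in a neighborhood of mass $\alpha$, can be tabulated directly. This is a small sketch only; the function name `prob_some_sample_in_ball` and the chosen values of $\alpha$ and $n$ are illustrative.

```python
def prob_some_sample_in_ball(alpha: float, n: int) -> float:
    """Probability that at least one of n iid draws lands in a set of mass alpha."""
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    # Even a neighborhood of small mass is eventually visited by the sample.
    for n in (10, 100, 1_000):
        print(n, prob_some_sample_in_ball(0.01, n))
```

The convergence to 1 holds for any fixed $\alpha > 0$, however small, which is why the sample eventually contains points arbitrarily close to an element of the restricted theoretical mean.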

Remark 4. The use of uniform equicontinuity in the proof of theorem 3 requires
special mention. Sverdrup-Thygeson (1981) was able to invoke the continuity of
$s(\delta)$ with respect to $\delta$ in equation (20) by using the compactness of $\mathcal{X}$. Here,
this property immediately follows from the uniform equicontinuity of the class of
exponentiated point functions, $\mathcal{D}^r(\mathcal{X})$. This was the sole argument in the proof
of Sverdrup-Thygeson (1981) for the a.s. convergence of the restricted Frechet
sample mean that required the compactness of $\mathcal{X}$. Hence, the boundedness of $d$
constitutes a sufficient condition.

Remark 5. Under our assumptions and the ones postulated by both Ziezold
(1977) and Sverdrup-Thygeson (1981), there is no guarantee that $\Theta \subseteq \operatorname{supp}(X)$
holds, as assumed in the definition of the restricted Frechet mean. In
particular, one can easily construct a measure space where $\Theta$ belongs to a set of
$\mu$-measure zero. Consider the random variable described in example 2, where
two point masses were located at $-1$ and $1$, respectively, and the Frechet mean
was computed with respect to the square of the 'Manhattan' distance. Clearly,
the Frechet mean is located in the barycenter of the interval $[-1,1]$ but that
center of mass does not belong to $\operatorname{supp}(X)$.
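The two-point situation of remark 5 can be reproduced numerically. The sketch below is an illustration that assumes equal weights of 1/2 on the two atoms, which the remark does not state explicitly, and approximates the unrestricted minimization by a grid search over $[-1, 1]$; all names are hypothetical.

```python
def frechet_function(z, atoms, r=2):
    """Theoretical Frechet function sum_x mu(x) * |x - z|**r for a discrete measure."""
    return sum(w * abs(x - z) ** r for x, w in atoms.items())


# Assumed measure: two atoms of equal mass at -1 and 1.
atoms = {-1.0: 0.5, 1.0: 0.5}

# Unrestricted mean: grid search over [-1, 1] with step 0.01.
grid = [k / 100 for k in range(-100, 101)]
unrestricted = min(grid, key=lambda z: frechet_function(z, atoms))

# Restricted mean: minimize over the support only, keeping all minimizers.
best_on_support = min(frechet_function(z, atoms) for z in atoms)
restricted = {z for z in atoms if frechet_function(z, atoms) == best_on_support}

if __name__ == "__main__":
    print("unrestricted:", unrestricted)
    print("restricted:", restricted)
```

Here the Frechet function equals $z^2 + 1$, so the unrestricted minimizer is the midpoint, which carries no mass, while the restricted mean is the set of both atoms.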
One of the considerable advantages of using the restricted Frechet sample
mean is that its Kuratowski outer limit is necessarily a non-empty
subset of $\mathcal{X}$. A generalization of the Bolzano-Weierstrass compactness theorem
to set-valued analysis states that any sequence of subsets of a separable metric
space converges to a (possibly empty) limit set (Aubin and Frankowska, 2009).
In a similar fashion, we here make use of the classical Bolzano-Weierstrass
theorem for real numbers in order to show that the Kuratowski outer limit of
the restricted Frechet sample mean in separable bounded metric spaces is
non-empty with probability one. That is, we use the fact that the sequence of
real-valued distances converges in order to deduce the almost sure non-emptiness
of $\operatorname{Limsup} \widehat{\Theta}_n^{*,r}$. Observe that a similar statement would not hold for the
unrestricted Frechet sample mean, as there is no guarantee that each element in the
sequence of unrestricted Frechet sample means is non-empty.

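Theorem 4. Under the conditions of theorem 1, and assuming that $\operatorname{supp}(\mu)$ is
closed, for every $r \geq 1$, we have $\Theta^{*,r} \neq \emptyset$ and

\[
\mathbb{P}\big[ \omega \in \Omega : \operatorname{Limsup} \widehat{\Theta}_n^{*,r}(\omega) \neq \emptyset \big] = 1.
\]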



Proof. Since $\operatorname{supp}(\mu)$ is closed it follows that both $\Theta^{*,r}$ and $\widehat{\Theta}_n^{*,r}$ are
non-empty, as otherwise, there would exist a divergent sequence in $\operatorname{supp}(\mu)$. For the
limit superior of the sample Frechet mean, choose an arbitrary reference point,
$x_0 \in \mathcal{X}$. Clearly, for every $n \in \mathbb{N}$ and using the boundedness of the metric on
$\mathcal{X}$, we have $d(\widehat{\Theta}_n^{*,r}(\omega), x_0) \leq M < \infty$. By the Bolzano-Weierstrass theorem for
sequences of bounded real numbers, it follows that for almost every $\omega \in \Omega$,
there exists a subsequence $n_k$ such that $d(\widehat{\Theta}_{n_k}^{*,r}(\omega), x_0) \to C$, for some constant
$C \in \mathbb{R}^+$.

Moreover, since the $\widehat{\Theta}_{n_k}^{*,r}$'s are non-empty, it follows that one can choose a
sequence $z_k \in \widehat{\Theta}_{n_k}^{*,r}$, for every $k \in \mathbb{N}$, such that $z_k \to x_1$ with $d(x_1,x_0) = C$. This
is always possible, since $d(\widehat{\Theta}_{n_k}^{*,r}(\omega), x_0) \to C$. However, in that case, we obtain
the following upper bound through an application of the triangle inequality,

\[
d\big( \widehat{\Theta}_{n_k}^{*,r}, x_1 \big) \leq d\big( \widehat{\Theta}_{n_k}^{*,r}, z_k \big) + d(z_k, x_1),
\]

for every $k$. Taking the limit inferior, this gives,

\[
\liminf d\big( \widehat{\Theta}_{n_k}^{*,r}, x_1 \big) \leq \liminf d\big( \widehat{\Theta}_{n_k}^{*,r}, z_k \big) + \liminf d(z_k, x_1),
\]

where both terms on the right-hand side vanish, since the $z_k$ belong to the
Frechet sample means and $z_k \to x_1$. Thus, $x_1 \in \operatorname{Limsup} \widehat{\Theta}_n^{*,r}(\omega)$, and this holds
for every $\omega \in \Omega$. □

5. Metric Squared Error (MSE) Convergence 

The convergence of an $\mathcal{X}$-valued random variable with respect to $d^r$ is here
denoted by

\[
X_n \xrightarrow{d^r} X,
\]

which signifies that

\[
\lim_{n \to \infty} \mathbb{E}[d(X_n(\omega), X)^r] = 0.
\]

Observe that we are here requiring that the limit of $\mathbb{E}[d(X_n(\omega), X)^r]$ is null,
which is a stronger condition than what we have considered thus far, when
evaluating the Kuratowski outer limit of the Frechet sample mean in section 2.
It will be shown, however, that such a stronger condition is satisfied, when both
$\Theta$ and the $\widehat{\Theta}_n$'s are non-empty.

5.1. Properties of $d^r$-Consistency

Equipped with this mode of convergence, we will be especially interested in
considering the $d^r$-convergence of a sequence of empirical Frechet means to the
corresponding theoretical Frechet mean. In this case, we will refer to this mode
of convergence as the $d^r$-consistency of a random empirical Frechet mean, with
respect to a fixed theoretical Frechet mean.

Definition 2. The Frechet sample mean, $\widehat{\Theta}_n$, is said to be $d^r$-consistent, for
some $r \geq 1$, with respect to the Frechet mean, $\Theta$, when $\widehat{\Theta}_n \xrightarrow{d^r} \Theta$. The case of
$r = 2$ will be referred to as metric squared error (MSE) convergence, and the
MSE is defined as follows,

\[
\mathrm{MSE}_d(\widehat{\Theta}_n) := \mathbb{E}[d(\widehat{\Theta}_n, \Theta)^2],
\]

where the expectation is taken with respect to the $n$ random variables in $\widehat{\Theta}_n$.

The MSE is clearly reminiscent of the classical mean squared error for 
real-valued random variables. This analogy is strengthened by the fact that 
a straightforward upper bound can be derived for the MSE, which replicates 
the classical decomposition of the mean squared error for estimators of real- 
valued random variables. In order to derive this decomposition, it will be useful 
to introduce the concept of the Frechet expectation of the sample estimator that 
we have considered so far. This expectation operator can be formally defined as 
follows, for any r > 1, 

\[
F_d^r[\widehat{\Theta}_n] := \operatorname*{arg\,inf}_{x' \in \mathcal{X}} \int \cdots \int d(\widehat{\Theta}_n(\mathbf{x}), x')^r \, d\mu^n(\mathbf{x}),
\]

where $\mu^n$ is the complete product measure on $\mathcal{X}^n$ and $\widehat{\Theta}_n(\mathbf{x})$ is not necessarily
the Frechet sample mean, but could be any sample estimator, which is a function
of $\mathbf{x}$. Using this notation, we can generalize the standard notion of the bias of a
given estimator, as the supremum of the distance between the target parameter
and its sample estimator. Thus, the $d^r$-bias is formally defined as follows

\[
b_d^r(\widehat{\Theta}_n) := \sup\{ d(\bar{\theta}_n, \Theta)^r : \bar{\theta}_n \in F_d[\widehat{\Theta}_n] \}, \qquad (21)
\]

for any $r \geq 1$, where for notational convenience, we have assumed that both $F[\cdot]$
and Frechet means are taken with respect to $r = 2$. Similarly, the variance of
an estimator can be constructed in the following manner,

\[
\mathrm{Var}_d^r(\widehat{\Theta}_n) := \inf_{x' \in \mathcal{X}} \mathbb{E}[d(\widehat{\Theta}_n(\mathbf{X}), x')^r],
\]

where the expectation is taken with respect to the $n$-dimensional random
vector $\mathbf{X} \in \mathcal{X}^n$. Equipped with these definitions, we can immediately derive a
generalization of the classical decomposition of the mean squared error.

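Lemma 4. Given a sample estimator $\widehat{\Theta}_n$ of $\Theta$, we have

\[
\mathrm{MSE}_d(\widehat{\Theta}_n) \leq 2 \mathrm{Var}_d(\widehat{\Theta}_n) + 2 b_d^2(\widehat{\Theta}_n).
\]

Proof. Since $(\mathcal{X}, d)$ is a metric space, we can invoke the triangle inequality on
$d(\widehat{\Theta}_n, \Theta)$ with respect to an element $\bar{\theta}_n \in F_d[\widehat{\Theta}_n]$, which gives

\[
d(\widehat{\Theta}_n, \Theta)^2 \leq \big( d(\widehat{\Theta}_n, \bar{\theta}_n) + d(\bar{\theta}_n, \Theta) \big)^2 \leq 2 d(\widehat{\Theta}_n, \bar{\theta}_n)^2 + 2 d(\bar{\theta}_n, \Theta)^2.
\]

It then suffices to take the expectation over $\mathbf{X}$, in order to obtain

\[
\mathrm{MSE}_d(\widehat{\Theta}_n) \leq 2 \mathbb{E}[d(\widehat{\Theta}_n, \bar{\theta}_n)^2] + 2 d(\bar{\theta}_n, \Theta)^2
\leq 2 \mathrm{Var}_d[\widehat{\Theta}_n] + 2 \sup\{ d(\bar{\theta}_n, \Theta)^2 : \bar{\theta}_n \in F_d[\widehat{\Theta}_n] \},
\]

where the latter inequality follows from the fact that $\bar{\theta}_n$ is an argument
minimizing $\mathbb{E}[d(\widehat{\Theta}_n, x')^2]$, for the first term; and through an application of the
definition of the $d$-bias in equation (21), for the second term. □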
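For real-valued estimators with $d(x,y) = |x - y|$ and $r = 2$, the Frechet expectation reduces to the ordinary expectation, and the bound of lemma 4 can be compared with the classical identity MSE = variance + squared bias. The sketch below is an illustration under that assumption, using a hypothetical estimator with a three-point sampling distribution; the function name `mse_decomposition` is not from the paper.

```python
def mse_decomposition(values, probs, target):
    """Return (MSE, variance, squared bias) of a real-valued estimator with a
    finite sampling distribution, under d(x, y) = |x - y| and r = 2."""
    # On the real line, the minimizer of E[(estimator - x')^2] is the mean.
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    bias_sq = (mean - target) ** 2
    mse = sum(p * (v - target) ** 2 for v, p in zip(values, probs))
    return mse, var, bias_sq


# Assumed toy sampling distribution of an estimator of the target value 1.0.
mse, var, bias_sq = mse_decomposition([0.0, 1.0, 3.0], [0.2, 0.5, 0.3], target=1.0)

if __name__ == "__main__":
    print("MSE:", mse, "Var + bias^2:", var + bias_sq, "bound:", 2 * var + 2 * bias_sq)
```

In this Euclidean special case the classical decomposition is an equality, so the factor 2 in lemma 4 reflects the price of using only the triangle inequality in a general metric space.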

Using the notation $\mathrm{MSE}_d^r(\widehat{\Theta}_n) := \mathbb{E}[d(\widehat{\Theta}_n, \Theta)^r]$, lemma 4 can be readily
generalized for any $r \geq 1$ to

\[
\mathrm{MSE}_d^r(\widehat{\Theta}_n) \leq 2^{r-1} \big( \mathrm{Var}_d^r[\widehat{\Theta}_n] + b_d^r(\widehat{\Theta}_n) \big),
\]

using the generalized triangle inequality (see Frechet, 1948, p. 228). As for the
standard convergence in $r$-th mean of real-valued random variables, convergence
in $d^r$ implies convergence in $d^s$ when $r > s$, as described in the following lemma.

Lemma 5. Given any sequence of $\mathcal{X}$-valued random variables, $X_n$ and $X$, if
$X_n \xrightarrow{d^r} X$, then $X_n \xrightarrow{d^s} X$, where $r > s \geq 1$.

Proof. By Lyapunov's inequality (Grimmett and Stirzaker, 2001), we have
$\mathbb{E}[|Z|^s]^{1/s} \leq \mathbb{E}[|Z|^r]^{1/r}$, for any real-valued random variable $Z$ and for every
$r > s \geq 1$, and therefore

\[
\big( \mathbb{E}[d(X_n(\omega), X)^s] \big)^{1/s} \leq \big( \mathbb{E}[d(X_n(\omega), X)^r] \big)^{1/r},
\]

also holds for any abstract-valued random variables and every $r > s \geq 1$. The
result follows by taking the limit with respect to $n$, on both sides, and noting
that since $X_n$ converges to $X$ in $d^r$, it follows that $\mathbb{E}[d(X_n(\omega), X)^r]$ converges
to 0 as $n \to \infty$. □
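Lyapunov's inequality, on which lemma 5 rests, states that the normalized moments of a non-negative random distance are non-decreasing in the order. The following is a small numerical sketch for a distance taking finitely many values; the function name `moment_norm` and the toy distribution are illustrative.

```python
def moment_norm(distances, probs, order):
    """Return (E[D**order])**(1/order) for a discrete non-negative variable D."""
    return sum(p * d ** order for d, p in zip(distances, probs)) ** (1.0 / order)


# Assumed toy distribution of the distance d(X_n, X).
ds = [0.0, 0.5, 2.0]
ps = [0.5, 0.3, 0.2]

if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, moment_norm(ds, ps, order))
```

Since the lower-order norm is dominated by the higher-order one, convergence of the $r$-th moment to zero forces convergence of the $s$-th moment for every $s < r$.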
5.2. MSE Consistency of Frechet Sample Mean

The MSE consistency of the Frechet sample mean in separable bounded metric
spaces holds provided that the target Frechet mean and the elements of
the sequence of empirical Frechet means are non-empty. This holds for Frechet
means of all orders, i.e. for every $r \geq 1$, which can be shown to be $d^s$-consistent,
for every $s \geq 1$. This theorem naturally follows from the previous results stating
the a.s. convergence of this estimator, the properties of the Kuratowski outer
limit and an application of the bounded convergence theorem.

Theorem 5. Assume that the conditions of theorem 1 hold and that in addition,
$\Theta^r, \widehat{\Theta}_n^r \neq \emptyset$, for every $n$. Then, for every $r,s \geq 1$, it follows that we have

\[
\widehat{\Theta}_n^r \xrightarrow{d^s} \Theta^r.
\]

Proof. By theorem 1, $\operatorname{Limsup} \widehat{\Theta}_n^r(\omega) \subseteq \Theta^r$ a.s., which implies that every subsequence
of $\widehat{\Theta}_n^r(\omega)$ converges to a subset of $\Theta^r$, and thus it immediately follows that

\[
\lim_{n \to \infty} d(\widehat{\Theta}_n^r(\omega), \Theta^r)^s = 0, \quad \text{a.s.},
\]

for every finite $s \geq 1$, using $d(A,B) := \inf\{d(x,y) : x \in A, y \in B\}$, for every
$A,B \subseteq \mathcal{X}$. In addition, observe that since $(\mathcal{X}, d)$ is a bounded metric space, we
have $d(\widehat{\Theta}_n^r(\omega), \Theta^r) \leq M$, and therefore $d(\widehat{\Theta}_n^r(\omega), \Theta^r)^s \leq M^s < \infty$, for every
finite $r, s \geq 1$, $n \in \mathbb{N}$ and $\omega \in \Omega$. Thus, we can invoke the bounded convergence
theorem, in order to take the expectation over the space $\Omega$,

\[
\lim_{n \to \infty} \mathbb{E}\big[ d(\widehat{\Theta}_n^r(\omega), \Theta^r)^s \big] = 0,
\]

for every finite $s \geq 1$, and this completes the proof. □

Remark 6. A substantial advantage of this particular mode of convergence is
that it automatically controls for the 'emptiness' of the Kuratowski outer limit of
the sequence of Frechet sample means. That is, if the outer limit of the sequence
of empirical Frechet mean sets is solely the empty set, then such a sample
estimator will fail to be $d^r$-consistent. By contrast, the mode of convergence
studied in section 3 would treat a sequence of sets with empty outer limit as
a.s. consistent. This follows from the fact that $\emptyset \subseteq \Theta$, for any $\Theta \subseteq \mathcal{X}$. In such
cases, a.s. consistency does not imply $d^r$-consistency.

Remark 7. Conversely, observe that for this particular mode of convergence,
the Frechet sample mean need not be entirely included in the theoretical mean.
That is, the Frechet sample and theoretical means solely need to have a
non-empty intersection. One may encounter the situation where $\widehat{\Theta}_n \cap \Theta \neq \emptyset$ and
$\widehat{\Theta}_n \not\subseteq \Theta$ occur, for infinitely many $n$. In this case, it follows that $d(\widehat{\Theta}_n, \Theta) \to 0$,
as $n \to \infty$, and therefore the Frechet sample mean is $d^r$-consistent, even though
this does not imply a.s. consistency.
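The distinction drawn in remark 7 comes from the set distance $d(A,B) := \inf\{d(x,y) : x \in A, y \in B\}$ used in the proof of theorem 5, which vanishes as soon as the two sets intersect. The sketch below illustrates this for finite non-empty subsets of the real line; the helper name `set_distance` and the example sets are hypothetical.

```python
def set_distance(A, B, d):
    """Infimum distance between two finite non-empty sets under the metric d."""
    return min(d(x, y) for x in A for y in B)


def abs_metric(x, y):
    return abs(x - y)


# Assumed toy sets: they share one point, but the first is not a subset of the second.
sample_mean_set = {0.0, 3.0}
theoretical_mean_set = {3.0, 7.0}

if __name__ == "__main__":
    print(set_distance(sample_mean_set, theoretical_mean_set, abs_metric))
    print(sample_mean_set <= theoretical_mean_set)
```

A zero set distance therefore carries no information about inclusion, which is why convergence in this distance and the a.s. inclusion of the outer limit are separate properties.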

Thus, from remarks 6 and 7, it follows that a.s. convergence of the Limsup
of the Frechet sample mean and $d^r$-convergence do not in general imply each
other. We close this section with an immediate corollary ascertaining the
$d^r$-convergence of the restricted Frechet sample mean. The proof strategy used
for theorem 5 can directly be employed in this setting. Here, however, we do
not need to assume the non-emptiness of the restricted Frechet mean or the
elements of the sequence of empirical restricted Frechet means, since the support
of the underlying measure, $\mu$, is already assumed to be closed.

Corollary 1. Under the conditions of theorem 3, for every $r \geq 1$,
$\widehat{\Theta}_n^{*,r} \xrightarrow{d^s} \Theta^{*,r}$, for every finite $s \geq 1$.

6. Conclusion 

In this paper, we have generalized the results due to Sverdrup-Thygeson (1981) 
by relaxing the compactness assumption made by this author. This task has 
highlighted interesting links between Sverdrup-Thygeson's proof and
another classical proof of the a.s. convergence of the Frechet sample mean, due to
Ziezold (1977). In particular, we have shown that by assuming the boundedness 
of the metric of interest, we can deduce the uniform boundedness and uni- 
form equicontinuity of any family of point functions on X. These two properties 
were found to be required on two distinct occasions when proving asymptotic 
convergence results for the unrestricted and restricted Frechet sample means, 
respectively. In the original proof of Sverdrup-Thygeson (1981), these two ar- 
guments rely on compactness, thereby showing that uniform boundedness and 
uniform equicontinuity constitute appropriate weaker assumptions. 

Throughout, we have assumed that the underlying metric of interest is a full
metric. However, as was originally done by Ziezold (1977), it can be shown that
our results also hold for bounded pseudo-metrics, where one relaxes the axiom
of coincidence. In this case, $d(x, y) = 0$ does not necessarily imply that $x = y$.
It is easy to check that this particular property was not used in this paper,
and therefore that all the aforementioned convergence theorems remain valid
for Frechet sample mean sets defined over separable bounded pseudo-metric
spaces.